AITopics | relative positional information

relative positional information

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech

Cho, Deok-Hyeon, Oh, Hyung-Seok, Kim, Seung-Bin, Lee, Seong-Whan

arXiv.org Artificial Intelligence, Oct-20-2025

Cross-speaker emotion transfer in speech synthesis relies on extracting speaker-independent emotion embeddings for accurate emotion modeling without retaining speaker traits. However, existing timbre compression methods fail to fully separate speaker and emotion characteristics, causing speaker leakage and degraded synthesis quality. To address this, we propose DiEmo-TTS, a self-supervised distillation method that minimizes emotional information loss while preserving speaker identity. We introduce cluster-driven sampling and information perturbation to preserve emotion while removing irrelevant factors. To facilitate this process, we propose an emotion clustering and matching approach using emotional attribute prediction and speaker embeddings, enabling generalization to unlabeled data. Additionally, we design a dual conditioning transformer to better integrate style features. Experimental results confirm the effectiveness of our method in learning speaker-irrelevant emotion embeddings.

artificial intelligence, emotion, machine learning

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2025-1394

2505.19687

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.73)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes

Guo, Zixun, Dixon, Simon

arXiv.org Artificial Intelligence, May-22-2025

Moonbeam is a transformer-based foundation model for symbolic music, pretrained on a large and diverse collection of MIDI data totaling 81.6K hours of music and 18 billion tokens. Moonbeam incorporates music-domain inductive biases by capturing both absolute and relative musical attributes through a novel domain-knowledge-inspired tokenization method and Multidimensional Relative Attention (MRA), which captures relative music information without additional trainable parameters. Building on the pretrained Moonbeam, we propose two finetuning architectures with full anticipatory capabilities, targeting two categories of downstream tasks: symbolic music understanding and conditional music generation (including music infilling). Our model outperforms other large-scale pretrained music models in most cases in terms of accuracy and F1 score across three downstream music classification tasks on four datasets. Moreover, our finetuned conditional music generation model outperforms a strong transformer baseline with a REMI-like tokenizer. We open-source the code, pretrained model, and generated samples on GitHub.
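The abstract does not spell out how MRA works, only that it adds relative information along several musical attributes without extra trainable parameters. As an illustration of that general idea (not Moonbeam's released code), the following NumPy sketch assumes a rotary-style construction: each attribute (here two, e.g. onset and pitch) gets its own slice of the head dimension, and that slice is rotated by the attribute's value. The function names `rotary`, `multi_attribute_rotary`, and `scores` are hypothetical.

```python
import numpy as np

def rotary(x, pos, base=10000.0):
    """Rotate feature pairs of x (T, d), d even, by angles pos * theta_j."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)      # fixed frequencies, not learned
    ang = pos[..., None] * theta                   # (T, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def multi_attribute_rotary(x, attrs):
    """x: (T, d); attrs: (T, K) attribute values per token (assumed numeric).
    Slice k of the head dimension is rotated by attribute k."""
    T, d = x.shape
    K = attrs.shape[1]
    assert d % (2 * K) == 0, "each slice needs an even width"
    s = d // K
    return np.concatenate(
        [rotary(x[:, k * s:(k + 1) * s], attrs[:, k]) for k in range(K)], axis=-1
    )

def scores(q, k, attrs):
    """Attention logits that depend only on attribute *differences* between tokens."""
    return multi_attribute_rotary(q, attrs) @ multi_attribute_rotary(k, attrs).T
```

Because a rotation by angle a followed by the inverse of a rotation by angle b is a rotation by a - b, shifting every token's attributes by the same constant leaves the logits unchanged, which is the "relative" property, and the only weights involved are fixed frequencies.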

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2505.15559

Country:

Europe (1.00)
Asia (0.68)
North America > United States > California (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser

Shi, Jingze, Xie, Ting, Wu, Bingheng, Zheng, Chunjun, Wang, Kai

arXiv.org Artificial Intelligence, Jun-24-2024

The Transformer architecture (Attention Is All You Need (Vaswani et al. 2017)) is popular in modern deep learning language modeling: it can directly capture the relationship between any two elements in a sequence and effectively handles long-distance dependencies. However, the architecture has two main drawbacks. First, when processing long sequences, the quadratic complexity of its self-attention mechanism and the size of its cache limit its ability to handle long contexts. Second, the Transformer lacks a single summary state, so each generated token must be computed over the entire context. Meanwhile, the selective state space model Mamba (Gu and Dao 2023) has emerged. Mamba scales linearly with sequence length during training and keeps a constant state size during generation through its selective state update mechanism.
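The contrast between attention's growing context and Mamba's constant-size state can be illustrated with a minimal NumPy sketch of a selective state-space recurrence. It assumes a diagonal, negative state matrix `A`, hypothetical projection weights `W_B`, `W_C`, and `W_delta` that make `B`, `C`, and the step size depend on the current input, and a plain sequential loop in place of Mamba's hardware-aware parallel scan. It is not OTCE's or Mamba's implementation.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(x, A, W_B, W_C, W_delta):
    """x: (T, D) inputs; A: (D, N) negative diagonal state matrix per channel.
    W_B, W_C: (D, N) and W_delta: (D,) are hypothetical input projections.
    Returns outputs (T, D) and the final state (D, N)."""
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                  # state size is fixed, independent of T
    ys = np.empty((T, D))
    for t in range(T):
        xt = x[t]                         # (D,)
        delta = softplus(xt * W_delta)    # input-dependent step size, (D,)
        B = xt @ W_B                      # input-dependent input matrix, (N,)
        C = xt @ W_C                      # input-dependent output matrix, (N,)
        A_bar = np.exp(delta[:, None] * A)    # zero-order-hold discretization of A
        B_bar = delta[:, None] * B[None, :]   # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * xt[:, None]   # selective state update
        ys[t] = h @ C                         # read out from the state, (D,)
    return ys, h
```

Because `h` never grows, producing one more output here costs O(DN) work, whereas self-attention has to revisit every earlier position; the loop is also causal, so changing a later input cannot affect earlier outputs.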

information, matrix, positional information

arXiv.org Artificial Intelligence

2406.16495

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Research on Named Entity Recognition in Improved transformer with R-Drop structure

Ji, Weidong, Zhang, Yousheng, Zhou, Guohui, Wang, Xu

arXiv.org Artificial Intelligence, Jun-14-2023

To enhance the generalization ability of the model and improve the effectiveness of the Transformer on named entity recognition tasks, this paper proposes the XLNet-Transformer-R model. It combines the XLNet pre-trained model with a Transformer encoder that uses relative positional encodings, strengthening the model's ability to process long text and learn contextual information and thereby improving robustness. To prevent overfitting, the R-Drop structure is used to improve the generalization capability and accuracy of the model on named entity recognition tasks. Ablation experiments on the MSRA dataset and comparisons with other models on four datasets show excellent performance, demonstrating the effectiveness of the XLNet-Transformer-R model's strategy.
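R-Drop passes each input through the model twice with different dropout masks. It then adds a symmetric KL-divergence term that pulls the two predicted distributions together, on top of the usual negative log-likelihood of both passes. The following NumPy sketch computes that loss from the two sets of logits. It assumes per-token classification (as in NER tagging), averaging over tokens, and a hypothetical weight `alpha`. The XLNet-Transformer-R paper's exact reduction and weighting may differ.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)          # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def r_drop_loss(logits1, logits2, labels, alpha=1.0):
    """logits1, logits2: (N, C) from two forward passes with different dropout masks.
    labels: (N,) integer class ids, e.g. one NER tag per token."""
    lp1, lp2 = log_softmax(logits1), log_softmax(logits2)
    idx = np.arange(len(labels))
    # Negative log-likelihood of the gold label under both passes.
    nll = -(lp1[idx, labels] + lp2[idx, labels]).mean()
    p1, p2 = np.exp(lp1), np.exp(lp2)
    # Symmetric KL: 0.5 * (KL(p1 || p2) + KL(p2 || p1)), averaged over tokens.
    kl12 = (p1 * (lp1 - lp2)).sum(axis=-1)
    kl21 = (p2 * (lp2 - lp1)).sum(axis=-1)
    kl = 0.5 * (kl12 + kl21).mean()
    return nll + alpha * kl
```

When both passes produce identical distributions the KL term vanishes, so `alpha` only penalizes disagreement caused by dropout. This disagreement penalty is the regularizing effect the abstract credits with reducing overfitting.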

information, machine learning, natural language

arXiv.org Artificial Intelligence

2306.08315

Country: Asia > China > Heilongjiang Province > Harbin (0.05)

Genre: Research Report (0.82)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Relative Positional Encoding

#artificialintelligence, Sep-23-2021, 08:46:30 GMT

In this post, we will take a look at relative positional encoding, as introduced in Shaw et al. (2018) and refined by Huang et al. (2018). This is a topic I had meant to explore for a while, but only recently did I really force myself to dive into it, after I started reading about music generation with NLP models. That is a separate topic for another post of its own, so let's not get distracted. Let's dive right into it! If you're already familiar with transformers, you probably know that they process all positions of the input in parallel, at once.
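Shaw et al. (2018) add a learned embedding for each clipped relative distance k - i to the attention logits. Gathering one embedding per (i, k) pair, however, needs an L×L×d intermediate. Huang et al. (2018) proposed a "skewing" procedure that obtains the same logits from a single L×L product followed by pad, reshape, and slice. The following NumPy sketch assumes causal (decoder) attention, a single head, and a hypothetical embedding `table` indexed by clipped distance. It shows both routes so they can be compared on the entries that survive the causal mask.

```python
import numpy as np

def relative_index(L, max_dist):
    """Clipped relative distance k - i, shifted into [0, 2 * max_dist] for table lookup."""
    pos = np.arange(L)
    rel = pos[None, :] - pos[:, None]              # rel[i, k] = k - i
    return np.clip(rel, -max_dist, max_dist) + max_dist

def rel_logits_direct(q, table, max_dist):
    """Shaw-style: gather an embedding for every (i, k) pair, O(L^2 d) memory."""
    L = q.shape[0]
    E = table[relative_index(L, max_dist)]         # (L, L, d)
    return np.einsum("id,ikd->ik", q, E)

def rel_logits_skewed(q, table, max_dist):
    """Huang-style skewing: one (L, L) matmul, then pad, reshape, and slice.
    Only entries with k <= i are valid; the rest must be masked."""
    L = q.shape[0]
    dists = np.arange(-(L - 1), 1)                 # relative distances -(L-1) .. 0
    Er = table[np.clip(dists, -max_dist, max_dist) + max_dist]   # (L, d)
    qe = q @ Er.T                                  # qe[i, c] = q_i . emb(c - (L - 1))
    padded = np.pad(qe, ((0, 0), (1, 0)))          # dummy zero column on the left
    return padded.reshape(L + 1, L)[1:]            # drop first row -> (L, L)
```

After the reshape, row i, column k of the skewed result holds `qe[i, k - i + L - 1]`, which is exactly the logit for relative distance k - i, so the shift from "absolute column" to "relative distance" costs no extra memory beyond the L×L matrix.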

information, matrix, relative positional information

#artificialintelligence

Technology: Information Technology > Artificial Intelligence (0.50)
